An Algorithm to Extract Jamaican Geographic Locations from News Articles - Using NLP Techniques
Authors
Abstract
Natural Language Processing (NLP) has long been used to extract information from large bodies of text, particularly where parsing the data manually would be infeasible. Named Entity Recognition (NER) identifies named entities such as people, places, and organizations in text written in natural language. Using NER, NLP algorithms can be built to extract mentions of different types of geographic locations from current and archived news articles. This adds a spatial dimension to previously flat datasets, allowing users to filter information by location, and the derived information can support intelligent decision-making and inform expert systems. This paper describes the development of an algorithm that applies the principles of both NLP and NER to extract references to geographic locations in news articles. The algorithm was developed using NLTK and the Pattern Web Toolkit for Python, and achieves precision and accuracy above 80 percent.
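The abstract does not spell out the algorithm's steps, so the following is only a hedged illustration of the extract-then-evaluate idea. It swaps NLTK's statistical NER for a simpler gazetteer lookup (a technique the paper may or may not use): a small, hypothetical list of Jamaican place names is matched against article text as whole words, longest name first, and the output is scored with set-based precision and recall. `GAZETTEER`, `build_pattern`, `extract_locations` and `precision_recall` are illustrative names, not the authors' code, and the abstract does not say how its accuracy figure was defined.

```python
import re

# Hypothetical mini-gazetteer (place name -> place type) for illustration only.
# A real system would load a complete list of Jamaican parishes, towns and districts.
GAZETTEER = {
    "kingston": "parish",
    "kingston harbour": "harbour",
    "st. andrew": "parish",
    "st. catherine": "parish",
    "spanish town": "town",
    "montego bay": "town",
    "mandeville": "town",
}


def build_pattern(names):
    """Compile one case-insensitive regex that matches any gazetteer name as whole words."""
    # Longest names first: Python tries alternatives left to right, so
    # "kingston harbour" is preferred over its prefix "kingston".
    alternatives = sorted(names, key=len, reverse=True)
    body = "|".join(re.escape(name) for name in alternatives)
    return re.compile(r"\b(" + body + r")\b", re.IGNORECASE)


PATTERN = build_pattern(GAZETTEER)


def extract_locations(text, pattern=PATTERN, gazetteer=GAZETTEER):
    """Return (place, type) pairs in the order they appear in the text."""
    found = []
    for match in pattern.finditer(text):
        key = match.group(1).lower()  # normalise case to look up the gazetteer entry
        found.append((key, gazetteer[key]))
    return found


def precision_recall(predicted, gold):
    """Set-based precision and recall of predicted place names against gold labels."""
    pred, ref = set(predicted), set(gold)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    article = ("Heavy rains flooded roads in St. Andrew and Spanish Town on Monday, "
               "while Montego Bay escaped damage.")
    places = extract_locations(article)
    print(places)
    # Hypothetical gold annotation that includes a place missing from the gazetteer.
    print(precision_recall([p for p, _ in places],
                           ["st. andrew", "spanish town", "montego bay", "portland"]))
```

Running the script prints the three matched places followed by `(1.0, 0.75)`: every extracted name is correct, but "Portland" is missed because it is absent from the toy gazetteer. A pure lookup like this cannot find places outside its list or resolve ambiguous names, which is one reason to pair it with an NER step of the kind the abstract describes.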
Similar resources
News Visualization based on Semantic Knowledge
Due to the overwhelming amount of news articles from a growing number of sources, it has become nearly impossible for humans to select and read all articles that are relevant to get deep insights and form conclusions. This leads to a need for an easy way to aggregate and analyze news articles efficiently and visualize the garnered knowledge as a base for further cognitive processing. The presen...
Identifying Disputed Topics in the News
News articles often reflect an opinion or point of view, with certain topics evoking more diverse opinions than others. For analyzing and better understanding public discourses, identifying such contested topics constitutes an interesting research question. In this paper, we describe an approach that combines NLP techniques and background knowledge from DBpedia for finding disputed topics in ne...
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Besides its own merits, text classification (TC) has become a cornerstone of many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to automatically create modules that would help speed up classification for any text categorization task. It also serves as a tool for...
Semantic Similarities between Locations based on Ontology
Toponym disambiguation, or location name resolution, is a critical task in unstructured text such as articles and documents. Our research explores how to link ambiguous locations mentioned in documents, news and articles with latitude/longitude coordinates. We designed an evaluation system for toponym disambiguation based on annotated GEOCLEF data. We implemented a node-based approach taking population ...
Unsupervised Storyline Extraction from News Articles
Storyline extraction from news streams aims to extract events under a certain news topic and reveal how those events evolve over time. It requires algorithms capable of accurately extracting events from news articles published in different time periods and linking these extracted events into coherent stories. The two tasks are often solved separately, which might suffer from the problem of erro...